Abstract

Elliptic curves over finite fields $\mathbb{F}_{2^n}$ play a prominent role in modern cryptography. Published
quantum algorithms dealing with such curves build on a short Weierstrass form in combination with affine or
projective coordinates. In this paper we show that changing the curve representation allows a substantial
reduction in the number of T-gates needed to implement the curve arithmetic. As a tool, we present a
quantum circuit for computing multiplicative inverses in $\mathbb{F}_{2^n}$ in depth $O(n \log n)$ using a polynomial
basis representation, which may be of independent interest.

1 Introduction 

Binary elliptic curves form an especially important family of groups for cryptographic applications, and 
the implementation of their addition law in a quantum circuit has been studied by a number of authors 
[11, 13]. To the best of our knowledge, in all these discussions the representation used for elliptic curves
is a short Weierstrass form in combination with affine or projective coordinates. While this is a natural 
choice, restricting to such representations does not exploit the available technical machinery — there is a 
substantial body of work on how to optimize elliptic curve arithmetic on classical hardware architectures 
(cf. H), and one may hope that some of these classical results allow for simplification at the circuit level 
when implementing binary elliptic curve arithmetic in a quantum circuit, e.g., when trying to find discrete
logarithms [21]. For an actual implementation, the number of T-gates needed to implement such a circuit
is particularly of interest and it is desirable to keep this number as small as possible. The reason for this 
is that for most fault-tolerant quantum computing schemes, the implementation of T-gates is achieved via 
so-called magic state distillation [6, 7, 18], a process which is costly in terms of physical resources required.
For instance, in the case of the surface code, it is reasonable to assume that a single T-gate has a cost that 
is about 100 times higher than a single CNOT Q. While minimizing the total number of T-gates is the 
prime objective of circuit synthesis at the logical level, the total depth of the computation when arranged as
an alternation between T-gates and Clifford gates (the so-called "T-depth") is also an important parameter.
It is desirable to keep the T-depth low by parallelizing T-gates as much as possible. 

Our contribution. Below, we show how changing the curve representation can help to reduce the number 
of T-gates needed to implement elliptic curve arithmetic — and in addition help to reduce the circuit depth. 
The quantum circuit we present makes use of point addition formulae suggested by Higuchi and Takagi [9]
and can in particular be used to reduce the number of gates as well as the depth, in comparison to the use of 
ordinary projective coordinates (cf. [13]).

Some applications of elliptic curves may require unique representations of curve points (cf. [13]). When
dealing with representations for fast arithmetic, deriving a unique point representation may involve an
inversion in the underlying finite field. In a polynomial basis representation, a quantum implementation of
the extended Euclidean algorithm can be used for this inversion; however, the circuit has $O(n^3)$ gates and
quadratic depth [11, 14, 13]. For other field representations, an inversion algorithm with depth $O(n \log n)$
and $O(n^2 \log n)$ gates has been proposed [1]. In order to compute unique point representations using a
polynomial basis more efficiently, we adapt the approach from [1] to the polynomial basis setting. In this way
we obtain the first published quantum circuit using a polynomial basis representation which can compute
inverses in $\mathbb{F}_{2^n}$ in depth $O(n \log n)$ with $O(n^2 \log n)$ gates.

2 Fixing a finite field representation 

Fast addition formulae for points on an elliptic curve over a finite binary field $\mathbb{F}_{2^n}$ aim at reducing the
number of (expensive) $\mathbb{F}_{2^n}$-operations. The following operations are of particular interest:

Addition: Given $\alpha, \beta \in \mathbb{F}_{2^n}$, compute their sum $\alpha + \beta$.

Multiplication: Given $\alpha, \beta \in \mathbb{F}_{2^n}$, compute their product $\alpha \cdot \beta$.

Multiplication with a constant: For a fixed non-zero constant $\gamma \in \mathbb{F}_{2^n}^*$, on input $\alpha \in \mathbb{F}_{2^n}$, compute $\gamma \cdot \alpha$.
The value $\gamma$, for example, could be a coefficient in the defining equation of an elliptic curve.

Squaring: Given $\alpha \in \mathbb{F}_{2^n}$, compute $\alpha^2$.

If one is interested in a unique representation of curve points, then the inversion of $\mathbb{F}_{2^n}$-elements also comes
into play.

Inversion: Given $\alpha \in \mathbb{F}_{2^n}^*$, find $\alpha^{-1} \in \mathbb{F}_{2^n}^*$.

The specific cost of each operation depends on how the field $\mathbb{F}_{2^n}$ is represented, and in the next two sections
we look at three representations that have been considered in the literature on quantum circuits.

2.1 Polynomial basis representation 

In a polynomial basis representation, $\mathbb{F}_{2^n}$ is identified with a quotient $\mathbb{F}_2[x]/(f)$ where $f \in \mathbb{F}_2[x]$ is an
irreducible polynomial of degree $n$. Each $\alpha \in \mathbb{F}_{2^n}$ is represented by the unique sequence
$(\alpha_0, \dots, \alpha_{n-1}) \in \mathbb{F}_2^n$ with $\alpha = \sum_{i=0}^{n-1} \alpha_i x^i + (f)$. In a quantum circuit, we store each coefficient $\alpha_i$ in a separate qubit. Quantum
arithmetic in such a representation has been explored by a number of authors, including Beauregard et al.
[3], Kaye and Zalka [11], and Maslov et al. [13]. For each of the four basic tasks mentioned above, the
exact implementation complexity varies depending on the particular choice of $f$, and efficient circuits are
available:

Addition: As addition is defined coefficient-wise, $n$ CNOT gates are sufficient to derive the representation
of $\alpha + \beta$ from those of $\alpha$ and $\beta$. These gates operate on disjoint wires and can be implemented in
depth 1. To realize an addition $|\alpha\rangle |\beta\rangle |0\rangle \mapsto |\alpha\rangle |\beta\rangle |\alpha + \beta\rangle$ where the sum is stored in a separate
register, we can first add $|\alpha\rangle$ to $|0\rangle$, followed by adding $|\beta\rangle$, i.e., $2n$ CNOT gates and depth 2 suffice.
In particular, we do not need a single T-gate to implement $\mathbb{F}_{2^n}$-addition.

Multiplication: Building on a classical Mastrovito multiplier [13, 16, 19], in [13] a linear depth quantum
circuit is presented which derives the product $\alpha \cdot \beta$ from $\alpha, \beta \in \mathbb{F}_{2^n}$. This circuit requires $n^2$ Toffoli
gates and $n^2 - 1$ CNOT gates. In particular, the T-gate complexity of a full $\mathbb{F}_{2^n}$-multiplication is quite
substantial.¹

Multiplication with a constant: Fix $\gamma \in \mathbb{F}_{2^n}^*$. As multiplication with $\gamma$ is $\mathbb{F}_2$-linear, invoking a general
multiplier is not necessary. Instead, we can realize multiplication by $\gamma$ as a matrix-vector
multiplication with a suitable non-singular matrix $\Gamma$. An LUP-decomposition of $\Gamma$ immediately yields a
depth $2n$ circuit that is comprised of no more than $n^2 + n$ CNOTs. No Toffoli gates are needed.

Squaring: No dedicated quantum circuit to implement the squaring map $|\alpha\rangle |0\rangle \mapsto |\alpha\rangle |\alpha^2\rangle$ has been
proposed, but as squaring in $\mathbb{F}_{2^n}$ is $\mathbb{F}_2$-linear, it is enough to implement a matrix-vector multiplication
in depth $2n$ using no more than $n \cdot (n + 1) = n^2 + n$ CNOTs. No Toffoli gates are needed.

Summarizing, among the above mentioned four basic operations, only the general multiplication involves
T-gates, and their number unfortunately grows quadratically in the extension degree $n$. In cryptographic
applications of elliptic curves, values of $n > 160$ are common. Hence, if we can save a general $\mathbb{F}_{2^n}$-multiplication
at the expense of some additions, squarings or constant multiplications, this can be of great value for the
implementor of a quantum circuit.
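
The four operations can be made concrete with a small classical model. The following is a minimal sketch, assuming the toy field $\mathbb{F}_{2^4}$ with $f = x^4 + x + 1$ and a hypothetical constant `GAMMA`; all names are illustrative and not taken from the cited circuits. It shows that squaring and multiplication by a constant are given by fixed matrices over $\mathbb{F}_2$, which is why CNOT gates alone suffice for them, while a general multiplication is not linear in a single operand register.

```python
# Elements of F_{2^n} are Python ints; bit i is the coefficient alpha_i of x^i.
N = 4
F_POLY = 0b10011  # f = x^4 + x + 1, an illustrative irreducible polynomial

def gf_add(a, b):
    """Coefficient-wise addition: one XOR (one CNOT) per coefficient."""
    return a ^ b

def gf_mul(a, b, n=N, f=F_POLY):
    """Shift-and-add multiplication, reducing modulo f after every shift."""
    result = 0
    for _ in range(n):
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if (a >> n) & 1:  # degree reached n: subtract (XOR) f
            a ^= f
    return result

def matrix_of(linear_map, n=N):
    """Columns of the F_2-matrix of an F_2-linear map: images of 1, x, ..., x^(n-1)."""
    return [linear_map(1 << i) for i in range(n)]

def apply_matrix(columns, a):
    """Matrix-vector product over F_2: XOR the columns selected by the bits of a."""
    out = 0
    for i, column in enumerate(columns):
        if (a >> i) & 1:
            out ^= column
    return out

GAMMA = 0b0110  # hypothetical fixed constant gamma = x^2 + x
SQUARING = matrix_of(lambda a: gf_mul(a, a))
TIMES_GAMMA = matrix_of(lambda a: gf_mul(GAMMA, a))
```

Applying `SQUARING` or `TIMES_GAMMA` with `apply_matrix` reproduces `gf_mul(a, a)` and `gf_mul(GAMMA, a)` for every field element; an LUP-decomposition of such a matrix is what the text turns into a CNOT circuit.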

So far, our discussion has ignored the inversion operation. The current literature offers only a circuit
with a cubic number of gates and quadratic depth [11], making the two representations discussed in the next
section seemingly more attractive for inversion. However, in Section 2.3 below, we will show that both the
cubic gate complexity and the quadratic depth of this operation can be avoided by adapting the inversion
technique used in [1] to the polynomial basis setting.

2.2 Gaussian normal basis and ghost-bit basis representations 

Aiming for a more efficient inversion algorithm, in [1] two field representations are considered that differ
from the polynomial basis representation just discussed: a ghost-bit basis and a Gaussian normal basis
representation. For the purposes of this paper it is not necessary to discuss their technical details, and we
restrict to looking at the cost of the relevant arithmetic operations:

Addition: With a Gaussian normal basis, addition can be performed in the same way as with a polynomial
basis. If a ghost-bit basis is available, elements in $\mathbb{F}_{2^n}$ are represented with $n + 1$ bits, resulting
again in two approaches for the addition. One approach is to add $|\alpha\rangle$ to $|\beta\rangle$, yielding one additional
CNOT gate and a depth 1 circuit. The other approach is to add $|\alpha\rangle$ followed by $|\beta\rangle$ to $|0\rangle$, yielding
two additional CNOT gates and a depth 2 circuit. Apart from these details, the addition operation is
exactly the same as when using a polynomial basis representation.

¹ With a realization of [2], a Toffoli gate can be implemented without ancillae with seven T-gates (or $T^\dagger$-gates which we assume
to have the same cost) in a circuit that has a T-depth of 3.

Multiplication: If a ghost-bit basis is available, the multiplication $\alpha \cdot \beta$ of two field elements $\alpha, \beta \in \mathbb{F}_{2^n}$
can be realized in depth $n + 1$ using $(n + 1)^2$ Toffoli gates.

With a Gaussian normal basis of type $t$, a quantum circuit of depth $(t + (t \bmod 2)) \cdot n - 1$ involving
$(t + (t \bmod 2)) \cdot n^2 - n$ Toffoli gates is available for multiplying two elements in $\mathbb{F}_{2^n}$.

Multiplication with a constant: Choosing the matrix $\Gamma$ in accordance with the Gaussian normal basis or
the ghost-bit basis, we can proceed as in the case of a polynomial basis. For a Gaussian normal basis
this yields a circuit with $n^2 + n$ CNOTs, and as a result of the extra bit used in a ghost-bit basis, for
the latter we obtain a quantum circuit comprised of $(n + 1) \cdot (n + 2) = n^2 + 3n + 2$ CNOT gates. No
Toffolis are needed.

Squaring: This operation is for free since the square of a field element can be obtained by simply reading
the coefficient vector in permuted order. Hence, no gates are required to implement the squaring
operation and we require $n$ respectively $n + 1$ CNOTs, all operating in parallel, to implement the map

$$|\alpha\rangle |0\rangle \mapsto |\alpha\rangle |\alpha^2\rangle.$$

Again, in terms of T-gate complexity, multiplication is the dominating operation, and the number of squaring 
operations in formulae for fast elliptic curve addition can be expected to be quite small. Consequently, using 
a polynomial basis representation looks preferable, even if the particular extension degree of interest affords 
a Gaussian normal basis of small type. 

However, taking the computation of inverses into account (an operation that occurs in the derivation of
a unique representation of a curve point), the situation seems to become more involved: In [1] an inversion
circuit of depth $O(n \log n)$ involving $O(n^2 \log n)$ gates has been presented. Compared to the quadratic
depth and cubic gate complexity of the best published inversion circuit using a polynomial basis [11], this
looks quite attractive. While [11] builds on Euclid's algorithm, [1] builds on a classical technique by Itoh
and Tsujii [10], which exploits that an efficient squaring algorithm is available. As mentioned, in the case
of a Gaussian normal basis or a ghost-bit basis representation, the squaring operations in a quantum circuit
are actually for free. To overcome the cubic gate complexity and quadratic depth requirements of inversion
using a polynomial basis, the next section shows how to apply Itoh and Tsujii's algorithm with a polynomial
basis.



2.3 Itoh-Tsujii inversion with a polynomial basis representation 

Let $\alpha \in \mathbb{F}_{2^n}$ be non-zero. As $\alpha^{-1} = \alpha^{2^n - 2}$, the inverse of $\alpha$ can be computed through exponentiation.
Itoh and Tsujii proposed a particularly efficient method to compute this power (see [10, 23, 20, 1]), if the
squaring operation in $\mathbb{F}_{2^n}$ is inexpensive. The quantum circuits for inversion in [1] use exactly this technique
when working with a field representation where squaring is just a permutation of the coefficient vector. Here
we want to show that even with a polynomial basis, this approach is a very attractive alternative to Euclid's
algorithm. To describe Itoh and Tsujii's approach, it is convenient to introduce some notation: for $i \ge 0$ we
define $\beta_i = \alpha^{2^i - 1}$. Then our goal is to find $\alpha^{-1} = (\beta_{n-1})^2$ from $\beta_1 = \alpha$. For this we exploit that

$$\beta_{i+j} = \beta_i \cdot \beta_j^{2^i} \qquad (1)$$

for all $i, j \ge 0$. Writing $n - 1 = \sum_{s=1}^{\mathrm{hw}(n-1)} 2^{k_s}$ with $\lfloor \log_2(n-1) \rfloor = k_1 > k_2 > \dots > k_{\mathrm{hw}(n-1)} \ge 0$, where $\mathrm{hw}$ denotes the Hamming weight, Itoh
and Tsujii's strategy to find $\alpha^{-1}$ can be summarized in three steps:

(I) Repeatedly apply Equation (1) with $i = j$ to find all of $\beta_{2^0}, \beta_{2^1}, \dots, \beta_{2^{k_1}}$.
(II) Use Equation (1) to find $\beta_{2^{k_1}+2^{k_2}}, \beta_{2^{k_1}+2^{k_2}+2^{k_3}}, \dots, \beta_{2^{k_1}+2^{k_2}+\dots+2^{k_{\mathrm{hw}(n-1)}}}$ $(= \beta_{n-1})$.
(III) Compute $\alpha^{-1} = (\beta_{n-1})^2$.
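
Equation (1) follows directly from the definition $\beta_i = \alpha^{2^i - 1}$. Spelled out as a routine check (not quoted from the cited works), together with the binary expansion for the illustrative degree $n = 163$:

```latex
\beta_i \cdot \beta_j^{2^i}
  = \alpha^{2^i - 1} \cdot \alpha^{(2^j - 1)\, 2^i}
  = \alpha^{2^i - 1 + 2^{i+j} - 2^i}
  = \alpha^{2^{i+j} - 1}
  = \beta_{i+j},
\qquad
n = 163:\quad n - 1 = 162 = 2^7 + 2^5 + 2^1,\quad
(k_1, k_2, k_3) = (7, 5, 1),\quad \mathrm{hw}(n-1) = 3.
```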

Computing a value $\beta_{i+j}$ from given values $\beta_i$, $\beta_j$ by means of Equation (1) involves one multiplication and an
exponentiation by a fixed power of 2. As mentioned in Section 2.1, the multiplication can be implemented
with $n^2$ Toffolis plus $n^2 - 1$ CNOT gates in a quantum circuit of depth $O(n)$. Differing from the situation
in [1], the exponentiation with $2^i$ is not for free, but as the map $\xi \mapsto \xi^{2^i}$ is $\mathbb{F}_2$-linear and bijective, we can
implement it as a matrix-vector multiplication with a suitable non-singular $n \times n$ matrix having entries in $\mathbb{F}_2$.
Thence, using an LUP-decomposition of this matrix, the needed exponentiation can be realized with $n^2 + n$
CNOT gates in depth $2n$. Summarizing, we see that in a polynomial basis representation, one evaluation of
Equation (1) can be realized in depth $O(n)$ using $n^2$ Toffolis and $2n^2 + n - 1$ CNOT gates.

Step (I) in the above procedure requires $\lfloor \log_2(n-1) \rfloor - 1$ evaluations of Equation (1), i.e., this step
can be realized in depth $O(n \log_2 n)$ by means of $(\lfloor \log_2(n-1) \rfloor - 1) \cdot n^2$ Toffolis and $O(n^2 \log n)$ CNOT
gates. In Step (II), performing $\mathrm{hw}(n-1) - 1$ evaluations of Equation (1) sequentially, we obtain a depth
of $O(n \log n)$, involving $(\mathrm{hw}(n-1) - 1) \cdot n^2$ Toffolis and $O(n^2 \log n)$ CNOT gates. Step (III) is just a
matrix-vector multiplication with a suitable non-singular $n \times n$ matrix, and using an LUP-decomposition of
the latter, a quantum circuit with no more than $n^2 + n$ CNOT gates can realize this squaring in depth $2n$.

To 'uncompute' ancillae, we run the complete circuit, with the exception of the final squaring, 'backwards'
and obtain the following:

Proposition 2.1. In a polynomial basis representation, $\alpha^{-1}$, the inverse of an element $\alpha \in \mathbb{F}_{2^n}^*$, can be
computed in depth $O(n \log_2(n))$ using $2 \cdot (\lfloor \log_2(n-1) \rfloor + \mathrm{hw}(n-1) - 2) \cdot n^2 = O(n^2 \log n)$ Toffolis and
$O(n^2 \log n)$ CNOT gates. This includes the cost for cleaning up ancillae.
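
Steps (I) to (III) can be checked with a classical model of the computation. The following is a minimal sketch, assuming integer-encoded field elements (bit $i$ holds the coefficient of $x^i$) and $n \ge 2$; the function names are illustrative, and the sketch models only the arithmetic, not the reversible circuit or its uncomputation.

```python
def gf_mul(a, b, n, f):
    """Multiply in F_2[x]/(f), deg f = n; elements are ints, addition is XOR."""
    result = 0
    for _ in range(n):
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if (a >> n) & 1:
            a ^= f
    return result

def frobenius(a, t, n, f):
    """Return a^(2^t) by t successive squarings (each squaring is F_2-linear)."""
    for _ in range(t):
        a = gf_mul(a, a, n, f)
    return a

def itoh_tsujii_inverse(alpha, n, f):
    """Invert a non-zero alpha (n >= 2) using beta_i = alpha^(2^i - 1)."""
    k1 = (n - 1).bit_length() - 1  # floor(log2(n - 1))
    # Step (I): doubling[k] = beta_{2^k}, via Equation (1) with i = j = 2^(k-1).
    doubling = [alpha]             # beta_1 = alpha
    for k in range(1, k1 + 1):
        prev = doubling[k - 1]
        doubling.append(gf_mul(prev, frobenius(prev, 2 ** (k - 1), n, f), n, f))
    # Step (II): add the remaining powers of two of n - 1 to the index.
    beta, index = doubling[k1], 2 ** k1
    for k in range(k1 - 1, -1, -1):
        if ((n - 1) >> k) & 1:
            # Equation (1) with i = index, j = 2^k.
            beta = gf_mul(beta, frobenius(doubling[k], index, n, f), n, f)
            index += 2 ** k
    # Step (III): alpha^(-1) = (beta_{n-1})^2.
    return gf_mul(beta, beta, n, f)
```

For example, with $n = 8$ and $f = x^8 + x^4 + x^3 + x + 1$ (`0x11B`), multiplying any non-zero element by the value returned for it yields 1.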

Remark 2.1. Organizing the computation of $\beta_{n-1}$ in Step (II) in a tree structure, the circuit depth for this
step can be reduced to $O(n \log\log n)$, but because of Step (I), for the overall depth of the inverter we still
obtain the bound $O(n \log_2 n)$.

Even though the squaring operation is not for free, in terms of T-gate complexity, this inverter seems
quite competitive with the ones presented in [1] for ghost-bit and Gaussian normal basis representations.
Thence, in the remainder of this paper we assume that a polynomial basis representation of the underlying
field $\mathbb{F}_{2^n}$ is used.

3 Binary elliptic curves 

Let $n \in \mathbb{N}$ be a positive integer and $\mathbb{F}_{2^n}$ a finite field of size $2^n$. For cryptographic applications, typical
values are $n \in \{163, 233, 283\}$ [17]. Perhaps the most common representation of ordinary elliptic curves in
characteristic 2 is a short Weierstrass form, given by a polynomial in $\mathbb{F}_{2^n}[x, y]$:

$$y^2 + xy = x^3 + a_2 x^2 + a_6 \qquad (2)$$

Here $a_2, a_6 \in \mathbb{F}_{2^n}$, with $a_6 \ne 0$, and for practical purposes one often has $a_2 \in \{0, 1\}$ (cf. [11]). We write

$$E_{a_2,a_6}(\mathbb{F}_{2^n}) := \{(u, v) \in \mathbb{F}_{2^n}^2 : v^2 + uv = u^3 + a_2 u^2 + a_6\} \cup \{O\}$$

for the ($\mathbb{F}_{2^n}$-rational points on the) elliptic curve given by Equation (2). The point $O \in E_{a_2,a_6}(\mathbb{F}_{2^n})$
corresponds to the 'point at infinity.'² Because of $a_6 \ne 0$, we have $(0, 0) \notin E_{a_2,a_6}(\mathbb{F}_{2^n})$, suggesting $(0, 0) \in \mathbb{F}_{2^n}^2$
as convenient representation of $O$. Hence, each curve point can be naturally represented as a pair of two field
elements (which fit into $2n$ qubits). The elliptic curve $E_{a_2,a_6}(\mathbb{F}_{2^n})$ is equipped with a natural group
structure, where $O$ serves as the identity. Namely, for $P_1 = (x_1, y_1)$ and $P_2 = (x_2, y_2)$, their sum $P_3 = P_1 + P_2$
can be computed by the procedure in Figure 1, which is taken from [22].

² More technically, $O$ is the unique point that is obtained when passing to the projective closure of $E_{a_2,a_6}$.

if P_1 = O then return P_2
if P_2 = O then return P_1

if x_1 = x_2 then if y_1 + y_2 = x_2
                  then return O                          # P_1 = -P_2
                  else λ ← x_2 + y_2/x_2                  # P_1 = P_2
                       x_3 ← λ^2 + λ + a_2
                       y_3 ← x_2^2 + (λ + 1) x_3
else λ ← (y_1 + y_2)/(x_1 + x_2)                         # P_1 ≠ ±P_2
     x_3 ← λ^2 + λ + x_1 + x_2 + a_2
     y_3 ← (x_2 + x_3) λ + x_3 + y_2

return (x_3, y_3)

Figure 1: adding two points on the elliptic curve $y^2 + xy = x^3 + a_2 x^2 + a_6$

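
The procedure in Figure 1 translates directly into classical code. The following is a minimal sketch, assuming the toy field $\mathbb{F}_{2^4}$ with $f = x^4 + x + 1$ and hypothetical coefficients $a_2 = a_6 = 1$; these parameters and all function names are illustrative only and far too small for cryptographic use.

```python
N, F_POLY = 4, 0b10011  # toy field F_16 = F_2[x]/(x^4 + x + 1) (illustrative)
A2, A6 = 1, 1           # hypothetical curve coefficients with A6 != 0
O = None                # the point at infinity

def gf_mul(a, b):
    """Multiply in F_{2^N}; field addition is XOR."""
    result = 0
    for _ in range(N):
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if (a >> N) & 1:
            a ^= F_POLY
    return result

def gf_inv(a):
    """a^(2^N - 2), computed naively; adequate for a toy field."""
    result = 1
    for _ in range(2 ** N - 2):
        result = gf_mul(result, a)
    return result

def on_curve(P):
    if P is O:
        return True
    x, y = P
    xx = gf_mul(x, x)
    return (gf_mul(y, y) ^ gf_mul(x, y)) == (gf_mul(xx, x) ^ gf_mul(A2, xx) ^ A6)

def ec_add(P1, P2):
    """Addition law of Figure 1, case by case."""
    if P1 is O:
        return P2
    if P2 is O:
        return P1
    (x1, y1), (x2, y2) = P1, P2
    if x1 == x2:
        if (y1 ^ y2) == x2:          # P1 = -P2
            return O
        lam = x2 ^ gf_mul(y2, gf_inv(x2))   # P1 = P2 (here x2 != 0)
        x3 = gf_mul(lam, lam) ^ lam ^ A2
        y3 = gf_mul(x2, x2) ^ gf_mul(lam ^ 1, x3)
    else:                            # P1 != +-P2
        lam = gf_mul(y1 ^ y2, gf_inv(x1 ^ x2))
        x3 = gf_mul(lam, lam) ^ lam ^ x1 ^ x2 ^ A2
        y3 = gf_mul(x2 ^ x3, lam) ^ x3 ^ y2
    return (x3, y3)

POINTS = [O] + [(x, y) for x in range(2 ** N) for y in range(2 ** N) if on_curve((x, y))]
```

Enumerating `POINTS` and adding all pairs is a quick way to confirm closure and commutativity on such a toy curve; the three branches are exactly the case distinctions that a quantum circuit for general point addition would have to handle.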
3.1 Choosing a curve representation: the cost of adding a fixed point 

Before looking at the task of implementing a general point addition $P_1 + P_2$, it is worthwhile to consider the
special case when $P_1 \ne O \ne P_2$, $P_1 \ne \pm P_2$, and $P_2$ is a fixed point. In a discrete logarithm computation
as discussed in [11, 13], this is the only case needed, i.e., only the very last case of the addition law in
Figure 1 needs to be taken into account. Still, when using affine coordinates, the addition law involves an
inversion in $\mathbb{F}_{2^n}$, and as indicated by the discussion in Section 2, this inversion operation is typically (much)
more expensive to implement than addition or multiplication in $\mathbb{F}_{2^n}$. Therefore, relying on a projective
formulation of the group law is a natural choice when designing quantum circuits. In projective coordinates,
each $(x, y) \in E_{a_2,a_6}(\mathbb{F}_{2^n}) \setminus \{O\}$ is represented by a triple $(X, Y, Z) \in \mathbb{F}_{2^n}^3$ such that $X/Z = x$ and
$Y/Z = y$, and $O$ is represented by a triple $(0, Y, 0) \in \mathbb{F}_{2^n}^3$ with $Y \ne 0$. These triples are only unique up to
multiplication with a non-zero element in $\mathbb{F}_{2^n}$. Maslov et al. [13] exploit this freedom to restrict the number
of finite field inversion circuits in a discrete logarithm computation. In particular, they observe that as
long as such a (non-unique) projective representation is sufficient, the addition of a constant curve point can
be realized in linear depth.

To the best of our knowledge, no detailed (gate-level) analysis of how to add a fixed point on an elliptic
curve has been published. Subsequently we note that, even with a clever implementation of projective
coordinates, the T-gate complexity of such a quantum circuit can be reduced substantially by passing to
a different curve representation. As a welcome aside, it seems that simultaneously the circuit depth can be
brought down.

3.1.1 Mixed addition with projective coordinates 

For the fixed point that is to be added, one can assume an affine representation is available, leaving no need to
handle a general 'Z-coordinate' for this operand. So using projective coordinates, a natural (non-trivial) way
to implement the addition of a fixed point is to apply the madd-2008-bl formulae from [4]: with the curve
parameter $a_2$ as in Equation (2), these formulae derive a projective representation $(X_3, Y_3, Z_3)$ of $P_1 + P_2$
with twelve $\mathbb{F}_{2^n}$-multiplications, three of them having one operand fixed (namely, one operand is $x_2$, $y_2$ or
$a_2$), seven $\mathbb{F}_{2^n}$-additions, and one squaring:

$$A = Y_1 + Z_1 \cdot y_2, \quad B = X_1 + Z_1 \cdot x_2, \quad AB = A + B,$$
$$C = B^2, \quad E = B \cdot C, \quad F = (A \cdot AB + a_2 \cdot C) \cdot Z_1 + E,$$
$$X_3 = B \cdot F,$$
$$Y_3 = C \cdot (A \cdot X_1 + B \cdot Y_1) + AB \cdot F,$$
$$Z_3 = E \cdot Z_1.$$

Translating these formulae one by one immediately yields a quantum circuit in which the number of Toffolis,
respectively T-gates, is determined by the nine general $\mathbb{F}_{2^n}$-multiplications. To reduce the circuit depth,
we can try to parallelize some of the computations. Adding some CNOT gates to create 'work copies' of
intermediate results, we can enable parallelization without increasing the number of T-gates. To characterize
the complexity of the resulting quantum circuit, we write $D_M(n)$ for the depth of an $\mathbb{F}_{2^n}$-multiplier

$$|\alpha\rangle |\beta\rangle |\xi\rangle \mapsto |\alpha\rangle |\beta\rangle |\xi + \alpha\beta\rangle$$

and $G_M(n)$ for the number of gates required to implement such a multiplier. Further, we write $D_M^T(n)$
for the T-depth of an $\mathbb{F}_{2^n}$-multiplier and $G_M^T(n)$ for the number of T-gates required to implement such a
multiplier. We assume that $D_M(n)$, $G_M(n)$, $D_M^T(n)$, and $G_M^T(n)$ include the cost for cleaning up ancillae.
Squaring operations and multiplications with a non-zero constant can be implemented with no more than
$n^2 + n$ CNOT gates in depth $2n$ each. As a functional composition of squarings and multiplications by a
non-zero constant can be combined into a single invertible $\mathbb{F}_2$-linear map (through matrix multiplication),
any fixed functional composition of squarings and non-zero constant multiplications can be implemented in
depth $2n$ with $n^2 + n$ CNOT gates as well.

Proposition 3.1. The point addition $|X_1\rangle |Y_1\rangle |Z_1\rangle |0\rangle |0\rangle |0\rangle \mapsto |X_1\rangle |Y_1\rangle |Z_1\rangle |X_3\rangle |Y_3\rangle |Z_3\rangle$ can be
implemented in overall depth $6D_M(n)$ plus $8n + O(1)$ (the latter accounting for CNOT gates), and T-depth
$6D_M^T(n)$. Further, a total of $15G_M(n)$ gates and $8n^2 + O(n)$ CNOT gates suffice. The total number of
T-gates is $15G_M^T(n)$. This includes the cost for cleaning up ancillae.

Here $(X_3, Y_3, Z_3)$ is some projective representation of $P_1 + P_2$ and $P_2 \in E_{a_2,a_6}(\mathbb{F}_{2^n})$ a fixed point,
represented with affine coordinates $(x_2, y_2)$.

Proof: To implement the madd-2008-bl formulae we can proceed as follows:

1. Create a 'work copy' $Z_1'$ of $Z_1$ using $n$ CNOT gates, all of which operate in parallel. Then compute
$Z_1 \cdot y_2$ and $Z_1' \cdot x_2$ in parallel and store these values in separate ($|0\rangle$-initialized) registers, using
$2 \cdot (n^2 + n)$ CNOT gates and depth $2n$.

2. Using $2n$ CNOT gates, all operating in parallel, add $Y_1$ to $Z_1 \cdot y_2$ and add $X_1$ to $Z_1' \cdot x_2$, so that those
registers now hold $A$ and $B$ respectively. Using $2n$ additional CNOT gates and increasing the circuit
depth by 2, we can also store $AB = A + B$ in a new ($|0\rangle$-initialized) register. Moreover, using $2n$
CNOT gates, we can in constant depth provide 'work copies' $A'$ of $A$ and $B'$ of $B$.

3. Using $n^2 + n$ CNOT gates, we can now compute $C = B^2$ in depth $2n$. If $a_2 \ne 0$, with no more than
$n^2 + n$ additional CNOT gates we can in parallel determine $a_2 \cdot (B')^2$.

4. Using four multiplication circuits that operate in parallel, we can now compute $E = B \cdot C$, $A \cdot AB$,
$A' \cdot X_1$ and $B' \cdot Y_1$ in depth $D_M(n)$, using $4 \cdot G_M(n)$ gates.

5. Next, using $\le 2n$ CNOT gates that operate in parallel we can add $A' \cdot X_1$ to $B' \cdot Y_1$ and, if $a_2 \ne 0$,
$A \cdot AB$ to $a_2 \cdot (B')^2$.

6. With three general $\mathbb{F}_{2^n}$-multipliers we can now compute $(A \cdot AB + a_2 \cdot (B')^2) \cdot Z_1'$, $C \cdot (A' \cdot X_1 + B' \cdot Y_1)$,
$Z_3 = E \cdot Z_1$ and store these values in new registers. For this, depth $D_M(n)$ and $3 \cdot G_M(n)$ gates
suffice.

7. By adding $(A \cdot AB + a_2 \cdot C) \cdot Z_1'$ to $E$ we obtain the value $F$ in depth 1, involving $n$ CNOT gates.
Increasing the depth by 1 and adding $n$ more CNOT gates, we can also create a 'work copy' $F'$ of $F$.

8. Invoking two more multiplication circuits, we can obtain $X_3 = B \cdot F$ and $AB \cdot F'$ in depth $D_M(n)$
with $2 \cdot G_M(n)$ gates.

9. Finally, adding $AB \cdot F'$ to $C \cdot (A' \cdot X_1 + B' \cdot Y_1)$ yields $Y_3$, and this addition can be realized in depth 1
with $n$ CNOT gates.

To clean up ancillae, the circuit is run backwards, excluding the final multiplications to compute $Z_3 = E \cdot Z_1$,
$X_3 = B \cdot F$, the multiplication $C \cdot (A' \cdot X_1 + B' \cdot Y_1)$, and the final addition to compute $Y_3$. This increases the
overall depth by $3D_M(n)$ plus $4n + O(1)$ (the latter accounting for CNOT gates), the T-depth by $3D_M^T(n)$,
the gate count by an additional $6G_M(n)$ plus $4n^2 + O(n)$ (the latter accounting for CNOT gates), and the
T-gate count by $6G_M^T(n)$. ■
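
As a sanity check of the formulae underlying Proposition 3.1, the madd-2008-bl computation can be compared with the affine addition law. The following is a minimal sketch, assuming the toy field $\mathbb{F}_{2^4}$ with $f = x^4 + x + 1$ and hypothetical coefficients $a_2 = a_6 = 1$; it is a classical model of the arithmetic only, and the function names are illustrative.

```python
N, F_POLY = 4, 0b10011  # toy field F_16 = F_2[x]/(x^4 + x + 1) (illustrative)
A2, A6 = 1, 1           # hypothetical curve coefficients with A6 != 0

def gf_mul(a, b):
    """Multiply in F_{2^N}; field addition is XOR."""
    result = 0
    for _ in range(N):
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if (a >> N) & 1:
            a ^= F_POLY
    return result

def gf_inv(a):
    """a^(2^N - 2), computed naively; adequate for a toy field."""
    result = 1
    for _ in range(2 ** N - 2):
        result = gf_mul(result, a)
    return result

def on_curve(x, y):
    xx = gf_mul(x, x)
    return (gf_mul(y, y) ^ gf_mul(x, y)) == (gf_mul(xx, x) ^ gf_mul(A2, xx) ^ A6)

def affine_add(x1, y1, x2, y2):
    """Last case of the affine addition law (requires x1 != x2)."""
    lam = gf_mul(y1 ^ y2, gf_inv(x1 ^ x2))
    x3 = gf_mul(lam, lam) ^ lam ^ x1 ^ x2 ^ A2
    return x3, gf_mul(x2 ^ x3, lam) ^ x3 ^ y2

def madd_2008_bl(X1, Y1, Z1, x2, y2):
    """Mixed addition: (X1 : Y1 : Z1) projective, (x2, y2) affine and fixed."""
    A = Y1 ^ gf_mul(Z1, y2)
    B = X1 ^ gf_mul(Z1, x2)
    AB = A ^ B
    C = gf_mul(B, B)
    E = gf_mul(B, C)
    F = gf_mul(gf_mul(A, AB) ^ gf_mul(A2, C), Z1) ^ E
    X3 = gf_mul(B, F)
    Y3 = gf_mul(C, gf_mul(A, X1) ^ gf_mul(B, Y1)) ^ gf_mul(AB, F)
    Z3 = gf_mul(E, Z1)
    return X3, Y3, Z3

def count_verified_cases():
    """Compare both addition laws for all P1 != +-P2 and all scalings Z1 != 0."""
    points = [(x, y) for x in range(2 ** N) for y in range(2 ** N) if on_curve(x, y)]
    verified = 0
    for x1, y1 in points:
        for x2, y2 in points:
            if x1 == x2:
                continue
            x3, y3 = affine_add(x1, y1, x2, y2)
            for Z1 in range(1, 2 ** N):
                X3, Y3, Z3 = madd_2008_bl(gf_mul(x1, Z1), gf_mul(y1, Z1), Z1, x2, y2)
                assert Z3 != 0
                assert X3 == gf_mul(x3, Z3) and Y3 == gf_mul(y3, Z3)
                verified += 1
    return verified
```

Counting the `gf_mul` calls in `madd_2008_bl` gives the twelve multiplications mentioned above, three of which involve the constants $x_2$, $y_2$ or $a_2$ (the squaring is written as a multiplication here only for brevity).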

3.1.2 Mixed addition with a formula by Higuchi and Takagi 

Building on earlier work by Lopez and Dahab [12], in [9] Higuchi and Takagi suggest a method to add
points on an elliptic curve, which requires fewer multiplications than the madd-2008-bl formulae we just
discussed. Again, we consider the case of a point addition $P_1 + P_2$ with $P_1 \ne \pm P_2$ and $P_1 \ne O \ne P_2$, where
$P_2$ is fixed. Instead of the usual projective coordinates $(X, Y, Z)$ with $x = X/Z$ and $y = Y/Z$ satisfying
Equation (2), Higuchi and Takagi choose a projective representation with $x = X/Z$ and $y = Y/Z^2$. The
corresponding projective formulation of Equation (2) then becomes

$$Y^2 + XYZ = X^3 Z + a_2 X^2 Z^2 + a_6 Z^4,$$

and the identity element $O$ is represented by $(X, 0, 0) \in \mathbb{F}_{2^n}^3$ with $X \in \mathbb{F}_{2^n}^*$ arbitrary. For adding a curve
point $P_1$ represented in these coordinates by $(X_1, Y_1, Z_1) \in \mathbb{F}_{2^n}^3$ to a fixed curve point $P_2$ given by affine
coordinates $(x_2, y_2) \in \mathbb{F}_{2^n}^2$, ten $\mathbb{F}_{2^n}$-multiplications along with nine $\mathbb{F}_{2^n}$-additions and three squarings
suffice. In two of the ten multiplications one operand is constant:

$$A = x_2 \cdot Z_1, \quad B_1 = X_1^2, \quad B_2 = A^2,$$
$$C = X_1 + A, \quad D = B_1 + B_2, \quad E = y_2 \cdot Z_1^2,$$
$$F = Y_1 + E, \quad G = F \cdot C,$$
$$Z_3 = Z_1 \cdot D,$$
$$X_3 = X_1 \cdot (E + B_2) + A \cdot (Y_1 + B_1),$$
$$Y_3 = (X_1 \cdot G + Y_1 \cdot D) \cdot D + (G + Z_3) \cdot X_3.$$

Allowing an additional squaring, which does not affect the T-gate complexity, the formula for $Y_3$ can be
rewritten as

$$Y_3 = X_1 \cdot D \cdot G + Y_1 \cdot D^2 + (G + Z_3) \cdot X_3. \qquad (3)$$

This latter formulation is helpful in deriving a quantum circuit with fewer T-gates and a lower depth than
the one in Proposition 3.1.
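
The formulae of Higuchi and Takagi, with $Y_3$ computed as in Equation (3), can be checked against the affine addition law in the same way. The following is a minimal sketch, assuming the toy field $\mathbb{F}_{2^4}$ with $f = x^4 + x + 1$ and hypothetical coefficients $a_2 = a_6 = 1$; the names are illustrative. Note that a point $(x, y)$ is now represented as $(xZ, yZ^2, Z)$.

```python
N, F_POLY = 4, 0b10011  # toy field F_16 = F_2[x]/(x^4 + x + 1) (illustrative)
A2, A6 = 1, 1           # hypothetical curve coefficients with A6 != 0

def gf_mul(a, b):
    """Multiply in F_{2^N}; field addition is XOR."""
    result = 0
    for _ in range(N):
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if (a >> N) & 1:
            a ^= F_POLY
    return result

def gf_inv(a):
    """a^(2^N - 2), computed naively; adequate for a toy field."""
    result = 1
    for _ in range(2 ** N - 2):
        result = gf_mul(result, a)
    return result

def on_curve(x, y):
    xx = gf_mul(x, x)
    return (gf_mul(y, y) ^ gf_mul(x, y)) == (gf_mul(xx, x) ^ gf_mul(A2, xx) ^ A6)

def affine_add(x1, y1, x2, y2):
    """Last case of the affine addition law (requires x1 != x2)."""
    lam = gf_mul(y1 ^ y2, gf_inv(x1 ^ x2))
    x3 = gf_mul(lam, lam) ^ lam ^ x1 ^ x2 ^ A2
    return x3, gf_mul(x2 ^ x3, lam) ^ x3 ^ y2

def higuchi_takagi_madd(X1, Y1, Z1, x2, y2):
    """Mixed addition with x = X/Z, y = Y/Z^2; Y3 follows Equation (3)."""
    A = gf_mul(x2, Z1)
    B1 = gf_mul(X1, X1)
    B2 = gf_mul(A, A)
    C = X1 ^ A
    D = B1 ^ B2
    E = gf_mul(y2, gf_mul(Z1, Z1))
    F = Y1 ^ E
    G = gf_mul(F, C)
    Z3 = gf_mul(Z1, D)
    X3 = gf_mul(X1, E ^ B2) ^ gf_mul(A, Y1 ^ B1)
    Y3 = gf_mul(gf_mul(X1, D), G) ^ gf_mul(Y1, gf_mul(D, D)) ^ gf_mul(G ^ Z3, X3)
    return X3, Y3, Z3

def count_verified_cases():
    """Compare both addition laws for all P1 != +-P2 and all scalings Z1 != 0."""
    points = [(x, y) for x in range(2 ** N) for y in range(2 ** N) if on_curve(x, y)]
    verified = 0
    for x1, y1 in points:
        for x2, y2 in points:
            if x1 == x2:
                continue
            x3, y3 = affine_add(x1, y1, x2, y2)
            for Z1 in range(1, 2 ** N):
                Z1_sq = gf_mul(Z1, Z1)
                X3, Y3, Z3 = higuchi_takagi_madd(gf_mul(x1, Z1), gf_mul(y1, Z1_sq), Z1, x2, y2)
                assert Z3 != 0
                assert X3 == gf_mul(x3, Z3)
                assert Y3 == gf_mul(y3, gf_mul(Z3, Z3))
                verified += 1
    return verified
```

The curve coefficient $a_2$ does not appear in `higuchi_takagi_madd`; the formula for $X_3$ relies on both points satisfying the same curve equation, which is why the check enumerates points of one fixed curve.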

Proposition 3.2. The point addition

$$|X_1\rangle |Y_1\rangle |Z_1\rangle |0\rangle |0\rangle |0\rangle \mapsto |X_1\rangle |Y_1\rangle |Z_1\rangle |X_3\rangle |Y_3\rangle |Z_3\rangle$$

can be implemented in overall depth $4D_M(n)$ plus $4n + O(1)$ (the latter being CNOT gates), and T-depth
$4D_M^T(n)$. Further, a total of $13G_M(n)$ gates and $8n^2 + O(n)$ CNOT gates suffice. The total number of
T-gates is $13G_M^T(n)$. This includes the cost for cleaning up ancillae.

Here $(X_3, Y_3, Z_3)$ is some projective representation of $P_1 + P_2$ as used by Higuchi and Takagi and $P_2$
a fixed curve point that is represented with affine coordinates $(x_2, y_2)$.

Proof: To implement the point addition formulae by Higuchi and Takagi we can proceed as follows:

1. Using $3n$ CNOT gates, in depth 2 we create 'work copies' $X_1'$ of $X_1$ as well as $Z_1'$, $Z_1''$ and $Z_1'''$ of $Z_1$.

2. With no more than $4 \cdot (n^2 + n)$ CNOT gates, use the matrix-vector multiplications to compute
$A = x_2 \cdot Z_1$, $B_1 = X_1^2$, $B_2 = (x_2 \cdot Z_1')^2$ and $E = y_2 \cdot (Z_1'')^2$, which can be performed in parallel in depth $2n$.
To be able to compute $D^2$, using $2 \cdot (n^2 + n)$ CNOT gates, we also compute in parallel $B_1^2 = (X_1')^4$
and $B_2^2 = (x_2 \cdot Z_1''')^4$.

3. Using $O(n)$ CNOT gates and constant depth we can now store $C = X_1 + A$, $D = B_1 + B_2$, a 'work
copy' $D'$ of $D$, and $F = Y_1 + E$ in separate registers. Moreover, maintaining constant depth and with
a linear number of CNOT gates, we can also store $E + B_2$, $Y_1 + B_1$, and $D^2 = B_1^2 + B_2^2$; the latter
three values will be used for computing $X_3$ and $Y_3$ respectively.

4. Now, using six general $\mathbb{F}_{2^n}$-multipliers, we can in parallel compute $G = F \cdot C$, $Z_3 = Z_1 \cdot D$,
$X_1 \cdot (E + B_2)$, $A \cdot (Y_1 + B_1)$, $X_1' \cdot D'$, and $Y_1 \cdot D^2$. For this, $6 \cdot G_M(n)$ gates and depth $D_M(n)$
suffice.

5. At this point, $O(n)$ CNOT gates and constant depth are adequate to compute
$X_3 = X_1 \cdot (E + B_2) + A \cdot (Y_1 + B_1)$ and $G + Z_3$ and store these values in new registers.

6. With two more multipliers that operate in parallel, $(X_1' \cdot D') \cdot G$ and $(G + Z_3) \cdot X_3$ can be computed.
Using $2 \cdot G_M(n)$ gates, this can be accomplished in depth $D_M(n)$.

7. Finally, using $O(n)$ CNOT gates and depth 2, with Equation (3) we can compute
$Y_3 = X_1 \cdot D' \cdot G + Y_1 \cdot D^2 + (G + Z_3) \cdot X_3$.

To clean up ancillae, we run the circuit backwards with the exception of the final additions to compute $Y_3$
and $X_3$ and the multipliers to compute $Z_3 = Z_1 \cdot D$, $(G + Z_3) \cdot X_3$ and $A \cdot (Y_1 + B_1)$. This increases the
overall depth by $2D_M(n)$ plus $2n + O(1)$ (the latter accounting for CNOT gates), the T-depth by $2D_M^T(n)$,
the gate count by an additional $5G_M(n)$ plus $6n^2 + O(n)$ (the latter accounting for CNOT gates), and the
T-gate count by $5G_M^T(n)$. ■

Comparing Proposition 3.1 and Proposition 3.2 we see that passing from the usual projective
representation to the one used by Higuchi and Takagi results in a significant saving in the total number of gates and
T-gates while reducing the circuit depth and T-depth. Thence, replacing the usual projective addition in
the quadratic depth solution for the discrete logarithm problem in [13] with the addition discussed in this
section is an attractive implementation option.
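
To get a feeling for the size of this saving, the T-gate counts of the two propositions (15 versus 13 multipliers) can be evaluated numerically. The sketch below rests on an assumption that is not part of the propositions themselves: each multiplier is taken to be the $n^2$-Toffoli circuit of Section 2.1 with seven T-gates per Toffoli, so that $G_M^T(n) = 7n^2$. The function names are illustrative.

```python
T_PER_TOFFOLI = 7  # assumption: Toffoli realization with seven T-gates

def multiplier_t_count(n):
    """T-gates of one multiplier under the assumption G_M^T(n) = 7 * n^2."""
    return T_PER_TOFFOLI * n * n

def point_addition_t_count(n, multipliers):
    """T-gates of a fixed-point addition that invokes the given number of multipliers."""
    return multipliers * multiplier_t_count(n)

for n in (163, 233, 283):
    projective = point_addition_t_count(n, 15)      # 15 multipliers, ordinary projective coordinates
    higuchi_takagi = point_addition_t_count(n, 13)  # 13 multipliers, Higuchi-Takagi coordinates
    print(n, projective, higuchi_takagi, projective - higuchi_takagi)
```

Under this assumption the saving per point addition is two multipliers, i.e. $14n^2$ T-gates, independent of any further constant-depth bookkeeping.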
3.2 Implementing a general point addition using Edwards curves

In view of the case distinctions in the addition law in Figure 1, implementing a quantum circuit that properly
handles all cases of a point addition appears to be a somewhat burdensome task: in addition to the 'generic
case' $P_1 \ne \pm P_2$ (with $P_2$ not being fixed) and $P_1 \ne O \ne P_2$, we have to implement a doubling formula
($P_1 = P_2$), making sure that the identity element is handled properly ($P_1 = -P_2$, $P_1 = O$ or $P_2 = O$). It is
important to note here that testing the branching conditions in Figure 1 comes at a certain cost when working
with inversion-free arithmetic as just discussed. With projective coordinates as described in Section 3.1.1,
let $(X_1, Y_1, Z_1) \in \mathbb{F}_{2^n}^3$ and $(X_2, Y_2, Z_2) \in \mathbb{F}_{2^n}^3$ be representations of two curve points $P_1$, $P_2$ different from
the identity. Checking if these two points satisfy

$$X_1/Z_1 = X_2/Z_2 \quad (\iff X_1 Z_2 = X_2 Z_1)$$

requires two $\mathbb{F}_{2^n}$-multiplications, not taking into account additional gates that may be needed to clean up
ancillae.

Working with a different representation of elliptic curves offers an elegant alternative to dealing with
the case distinctions in Figure 1. In [5], Bernstein et al. discuss a representation of ordinary elliptic curves
over $\mathbb{F}_{2^n}$ which affords a complete addition law, i.e., the addition of any two curve points is handled with
the very same formula. For $n \ge 3$ (which is especially safe to assume in cryptographic applications), each
ordinary elliptic curve is birationally equivalent to such a complete binary Edwards curve [5].

Definition 3.1 (Complete binary Edwards curve). Let $d_1, d_2 \in \mathbb{F}_{2^n}$ with $\mathrm{Tr}(d_2) = 1$. Then the complete
binary Edwards curve with coefficients $d_1$ and $d_2$ is the affine curve defined by

$$d_1(x + y) + d_2(x^2 + y^2) = xy + xy(x + y) + x^2 y^2.$$

We will write $E_{B,d_1,d_2}(\mathbb{F}_{2^n})$ for the set of ($\mathbb{F}_{2^n}$-rational) points on this curve.

The identity element of a complete binary Edwards curve is $(0, 0) \in E_{B,d_1,d_2}(\mathbb{F}_{2^n})$, and for any two
points $P_1 = (x_1, y_1)$ and $P_2 = (x_2, y_2)$ in $E_{B,d_1,d_2}(\mathbb{F}_{2^n})$, their sum is $P_3 = (x_3, y_3)$ with

$$x_3 = \frac{d_1(x_1 + x_2) + d_2(x_1 + y_1)(x_2 + y_2) + (x_1 + x_1^2)(x_2(y_1 + y_2 + 1) + y_1 y_2)}{d_1 + (x_1 + x_1^2)(x_2 + y_2)},$$

$$y_3 = \frac{d_1(y_1 + y_2) + d_2(x_1 + y_1)(x_2 + y_2) + (y_1 + y_1^2)(y_2(x_1 + x_2 + 1) + x_1 x_2)}{d_1 + (y_1 + y_1^2)(x_2 + y_2)}.$$
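
That a single formula covers all cases, including doubling and the identity, can be observed on a small example. The following is a minimal sketch, assuming the toy field $\mathbb{F}_{2^4}$ with $f = x^4 + x + 1$, the hypothetical choice $d_1 = 1$, and $d_2$ picked as the first element of trace 1; these parameters and the function names are illustrative and not taken from [5].

```python
N, F_POLY = 4, 0b10011  # toy field F_16 = F_2[x]/(x^4 + x + 1) (illustrative)

def gf_mul(a, b):
    """Multiply in F_{2^N}; field addition is XOR."""
    result = 0
    for _ in range(N):
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if (a >> N) & 1:
            a ^= F_POLY
    return result

def gf_inv(a):
    """a^(2^N - 2), computed naively; adequate for a toy field."""
    result = 1
    for _ in range(2 ** N - 2):
        result = gf_mul(result, a)
    return result

def trace(a):
    """Absolute trace Tr(a) = a + a^2 + a^4 + ... + a^(2^(N-1)), a value in {0, 1}."""
    total, power = 0, a
    for _ in range(N):
        total ^= power
        power = gf_mul(power, power)
    return total

D1 = 1                                               # hypothetical non-zero d_1
D2 = next(d for d in range(2 ** N) if trace(d) == 1)  # some d_2 with Tr(d_2) = 1

def on_curve(P):
    x, y = P
    xy = gf_mul(x, y)
    lhs = gf_mul(D1, x ^ y) ^ gf_mul(D2, gf_mul(x, x) ^ gf_mul(y, y))
    return lhs == (xy ^ gf_mul(xy, x ^ y) ^ gf_mul(xy, xy))

def edwards_add(P1, P2):
    """Affine addition law; no case distinctions are made."""
    (x1, y1), (x2, y2) = P1, P2
    w2 = x2 ^ y2
    common = gf_mul(D2, gf_mul(x1 ^ y1, w2))
    a = x1 ^ gf_mul(x1, x1)
    b = y1 ^ gf_mul(y1, y1)
    num_x = gf_mul(D1, x1 ^ x2) ^ common ^ gf_mul(a, gf_mul(x2, y1 ^ y2 ^ 1) ^ gf_mul(y1, y2))
    num_y = gf_mul(D1, y1 ^ y2) ^ common ^ gf_mul(b, gf_mul(y2, x1 ^ x2 ^ 1) ^ gf_mul(x1, x2))
    den_x = D1 ^ gf_mul(a, w2)
    den_y = D1 ^ gf_mul(b, w2)
    if den_x == 0 or den_y == 0:
        raise ZeroDivisionError("denominator vanished: addition law not complete here")
    return gf_mul(num_x, gf_inv(den_x)), gf_mul(num_y, gf_inv(den_y))

POINTS = [(x, y) for x in range(2 ** N) for y in range(2 ** N) if on_curve((x, y))]
```

Adding every pair from `POINTS`, including each point to itself and to $(0, 0)$, exercises the formula without ever reaching the `ZeroDivisionError` branch, which is the property that makes this representation attractive for a quantum circuit.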

Similar to working with a short Weierstrass form, one can pass to projective coordinates to avoid costly
inversions. In [5] an explicit addition formula is given to compute a representation $(X_3, Y_3, Z_3)$ of the sum of
two points on a complete binary Edwards curve, represented projectively as $(X_1, Y_1, Z_1)$ and $(X_2, Y_2, Z_2)$.
The formula involves 21 general multiplications in $\mathbb{F}_{2^n}$, three multiplications by the parameter $d_1$, one
multiplication by the parameter $d_2$, 15 additions of $\mathbb{F}_{2^n}$-elements, and one squaring:

$$W_1 = X_1 + Y_1, \quad W_2 = X_2 + Y_2, \quad A = X_1 \cdot (X_1 + Z_1), \quad B = Y_1 \cdot (Y_1 + Z_1),$$

$$C = Z_1 \cdot Z_2, \quad D = W_2 \cdot Z_2, \quad E = d_1 C^2, \quad H = (d_1 Z_2 + d_2 W_2) \cdot W_1 \cdot C,$$

$$I = d_1 Z_1 \cdot C, \quad U = E + A \cdot D, \quad V = E + B \cdot D, \quad S = U \cdot V,$$

$$X_3 = S \cdot Y_1 + (H + X_2 \cdot (I + A \cdot (Y_2 + Z_2))) \cdot V \cdot Z_1,$$

$$Y_3 = S \cdot X_1 + (H + Y_2 \cdot (I + B \cdot (X_2 + Z_2))) \cdot U \cdot Z_1,$$

$$Z_3 = S \cdot Z_1.$$
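
The projective formulas above can be transcribed directly into classical code, which is a convenient way to sanity-check them before mapping them to a circuit. The sketch below is an illustration only: the toy field $\mathbb{F}_{2^5}$, the reduction polynomial $x^5 + x^2 + 1$, the coefficients $d_1 = d_2 = 1$, and the helpers `lift` and `to_affine` are assumptions, not part of the paper. The single inversion in `to_affine` is used only to compare results, never inside the addition.

```python
N = 5                  # toy extension degree (assumption)
MOD = 0b100101         # x^5 + x^2 + 1, irreducible over GF(2)
D1, D2 = 1, 1          # hypothetical curve coefficients with Tr(D2) = 1

def gf_mul(a, b):
    """Multiply two elements of GF(2^N) in polynomial basis."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a >> N:
            a ^= MOD
    return r

def gf_inv(a):
    """Inverse of a nonzero element as a^(2^N - 2); adequate for a toy field."""
    r = 1
    for _ in range(2**N - 2):
        r = gf_mul(r, a)
    return r

def on_curve(x, y):
    """Affine curve equation of the complete binary Edwards curve."""
    xy = gf_mul(x, y)
    lhs = gf_mul(D1, x ^ y) ^ gf_mul(D2, gf_mul(x, x) ^ gf_mul(y, y))
    rhs = xy ^ gf_mul(xy, x ^ y) ^ gf_mul(xy, xy)
    return lhs == rhs

def add_proj(P, Q):
    """Inversion-free addition, following the formulas term by term."""
    m = gf_mul
    X1, Y1, Z1 = P
    X2, Y2, Z2 = Q
    W1, W2 = X1 ^ Y1, X2 ^ Y2
    A = m(X1, X1 ^ Z1)
    B = m(Y1, Y1 ^ Z1)
    C = m(Z1, Z2)
    D = m(W2, Z2)
    E = m(D1, m(C, C))
    H = m(m(m(D1, Z2) ^ m(D2, W2), W1), C)
    I = m(m(D1, Z1), C)
    U = E ^ m(A, D)
    V = E ^ m(B, D)
    S = m(U, V)
    X3 = m(S, Y1) ^ m(m(H ^ m(X2, I ^ m(A, Y2 ^ Z2)), V), Z1)
    Y3 = m(S, X1) ^ m(m(H ^ m(Y2, I ^ m(B, X2 ^ Z2)), U), Z1)
    Z3 = m(S, Z1)
    return X3, Y3, Z3

def lift(P, lam):
    """Projective representative (lam*x, lam*y, lam) of an affine point, lam != 0."""
    x, y = P
    return gf_mul(x, lam), gf_mul(y, lam), lam

def to_affine(P):
    """Normalize (X, Y, Z) to (X/Z, Y/Z); used here only for checking."""
    X, Y, Z = P
    zi = gf_inv(Z)
    return gf_mul(X, zi), gf_mul(Y, zi)

# All rational points of the toy curve, found by exhaustive search.
POINTS = [(x, y) for x in range(2**N) for y in range(2**N) if on_curve(x, y)]
```

For any two points of `POINTS`, lifted with arbitrary nonzero scalings, `to_affine(add_proj(...))` lands on the curve again and does not depend on the chosen scalings, which is the behaviour expected of a projective representation.
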

These formulae can be translated into a quantum circuit for adding arbitrary (variable) curve points:

Proposition 3.3. Denote by $(X_1, Y_1, Z_1)$ and $(X_2, Y_2, Z_2)$ projective representations of two (not necessarily
distinct) points $P_1, P_2 \in E_{B,d_1,d_2}$. Then the point addition

$$|X_1\rangle |Y_1\rangle |Z_1\rangle |X_2\rangle |Y_2\rangle |Z_2\rangle |0\rangle |0\rangle |0\rangle \longmapsto |X_1\rangle |Y_1\rangle |Z_1\rangle |X_2\rangle |Y_2\rangle |Z_2\rangle |X_3\rangle |Y_3\rangle |Z_3\rangle$$

can be implemented in overall depth $5 D_M(n) + 4\max(D_M(n), 2n) + O(1)$, where the argument $2n$ of
$\max(\cdot)$ as well as the $O(1)$ reflect CNOT gates only, and T-depth $9 D_M^T(n)$. Further, a total of $39 G_M(n)$ gates
plus $8n^2 + O(n)$ CNOT gates suffice. The total number of T-gates is $39 G_M^T(n)$. Here, $(X_3, Y_3, Z_3)$ is a
projective representation of $P_1 + P_2$. This includes the cost for cleaning up ancillae.

Proof: To implement the above addition formulae, we proceed as follows: 

1. Compute in parallel the values $W_1, W_2$ as well as $X_1 + Z_1$ and $Y_1 + Z_1$, $Y_2 + Z_2$, and $X_2 + Z_2$ from
the input values $X_1, Y_1, Z_1, X_2, Y_2, Z_2$; this can be done in constant depth using $O(n)$ CNOT gates.
In addition we use (depth 1) additions to $|0\rangle$ to create 'work copies' $W_2'$ of $W_2$, $Z_1'$ of $Z_1$, and $Z_2'$, $Z_2''$
of $Z_2$ using $3n$ CNOT gates.

2. Using four general $\mathbb{F}_{2^n}$-multipliers and two matrix vector multiplications, compute in parallel the
values $A$, $B$, $C$, $D = W_2 \cdot Z_2'$, along with $d_1 Z_2''$ and $d_2 W_2'$. As all involved multipliers operate on
disjoint sets of wires, this can be done in depth $\max(D_M(n), 2n)$ using no more than $4 G_M(n)$ plus
$2 \cdot (n^2 + n)$ gates (the latter accounting for CNOT gates).

3. Compute (in preparation for computing $H$) the value $d_1 Z_2'' + d_2 W_2'$ and create 'work copies' $A'$ of $A$,
$B'$ of $B$, $C'$ of $C$, and $D'$ of $D$ using $O(n)$ CNOT gates and constant depth.

4. Using five general $\mathbb{F}_{2^n}$-multipliers and two matrix vector multiplications, compute in parallel the
values $E = d_1 C^2$, $W_1 \cdot C$, $A \cdot D$, $B \cdot D'$, $A' \cdot (Y_2 + Z_2)$, $B' \cdot (X_2 + Z_2)$ and $d_1 Z_1$. This can be done
in depth $\max(D_M(n), 2n)$ with no more than $5 G_M(n)$ plus $2 \cdot (n^2 + n)$ gates (the latter accounting
for CNOT gates).

5. Compute $U$ and $V$ and create 'work copies' $U'$ of $U$ and $V'$ of $V$ in constant depth using $O(n)$ CNOT
gates.

6. Using five general $\mathbb{F}_{2^n}$-multipliers, find $H$, $I$, $S$, $U'Z_1'$ and $V'Z_1$ using $5 G_M(n)$ gates in depth
$D_M(n)$.

7. Compute $I + A \cdot (Y_2 + Z_2)$ and $I + B' \cdot (X_2 + Z_2)$ in constant depth using $O(n)$ CNOT gates.
Moreover, generate a 'work copy' $S'$ of $S$ using $n$ CNOT gates and maintaining constant depth.

8. Using four general $\mathbb{F}_{2^n}$-multipliers, compute in parallel $X_2 \cdot (I + A \cdot (Y_2 + Z_2))$ and $Y_2 \cdot (I + B \cdot (X_2 + Z_2))$,
$SX_1$ and $S'Y_1$, in depth $D_M(n)$ using $4 G_M(n)$ gates.

9. Involving $O(n)$ CNOT gates, compute $H + X_2 \cdot (I + A \cdot (Y_2 + Z_2))$ and $H + Y_2 \cdot (I + B \cdot (X_2 + Z_2))$
in depth 2.

10. Multiply $H + X_2 \cdot (I + A \cdot (Y_2 + Z_2))$ with $V'Z_1$, $H + Y_2 \cdot (I + B \cdot (X_2 + Z_2))$ with $U'Z_1'$, and
compute $Z_3 = S \cdot Z_1$. This can be done using $3 G_M(n)$ gates in depth $D_M(n)$.

11. Compute $X_3$ by adding $S'Y_1$ to $(H + X_2 \cdot (I + A \cdot (Y_2 + Z_2))) \cdot V'Z_1$ and $Y_3$ by adding $SX_1$ to
$(H + Y_2 \cdot (I + B \cdot (X_2 + Z_2))) \cdot U'Z_1'$ in depth 1 using $O(n)$ CNOT gates.

The above circuit has depth $3 D_M(n) + 2\max(D_M(n), 2n) + O(1)$, with the argument $2n$ of $\max(\cdot)$ as well as
the $O(1)$ originating in CNOT gates. The number of gates is bounded by $21 G_M(n)$ plus $4n^2 + O(n)$ CNOTs.
'Uncomputing' auxiliary qubits by running the circuit backwards (with the exception of the multiplications
$Z_3 = S \cdot Z_1$, $(H + Y_2 \cdot (I + B' \cdot (X_2 + Z_2))) \cdot U'Z_1'$, $(H + X_2 \cdot (I + A \cdot (Y_2 + Z_2))) \cdot V'Z_1$, and the final
additions to compute $X_3$ and $Y_3$) yields the desired bound. ■
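
The bookkeeping behind these bounds can be replayed with a small tally. The sketch below is illustrative only: the table `STEPS` is read off the multiplication layers in steps 2, 4, 6, 8 and 10 of the proof, and it assumes that each such layer contributes one multiplier T-depth and that the matrix vector multiplications consist of CNOT gates only, as the proof indicates. The names `STEPS` and `tally` are hypothetical.

```python
# One entry per multiplication layer of the proof:
# (step, general multipliers, depth term, n^2-CNOT terms, uncomputed afterwards?)
STEPS = [
    (2, 4, "max", 2, True),    # A, B, C, D plus two matrix vector products
    (4, 5, "max", 2, True),    # five products plus two matrix vector products
    (6, 5, "DM", 0, True),     # H, I, S, U'Z1', V'Z1
    (8, 4, "DM", 0, True),     # four products
    (10, 3, "DM", 0, False),   # final products feeding X3, Y3, Z3 are kept
]

def tally(steps):
    """Sum costs over the forward pass and the reverse pass of uncomputed layers."""
    total = {"mult": 0, "DM": 0, "max": 0, "n2": 0}
    for _step, mults, depth_kind, n2_terms, uncomputed in steps:
        runs = 2 if uncomputed else 1
        total["mult"] += runs * mults       # coefficient of G_M(n)
        total[depth_kind] += runs           # coefficient of D_M(n) or max(D_M(n), 2n)
        total["n2"] += runs * n2_terms      # coefficient of n^2 in the CNOT count
    total["t_depth"] = total["DM"] + total["max"]   # multiplier layers in sequence
    return total

print(tally(STEPS))
# {'mult': 39, 'DM': 5, 'max': 4, 'n2': 8, 't_depth': 9}
```

The totals reproduce the coefficients in Proposition 3.3: 39 multipliers, depth $5 D_M(n) + 4\max(D_M(n), 2n)$, $8n^2$ CNOT gates to leading order, and nine multiplier layers for the T-depth.
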

Making use of the (linear-depth and polynomial-size) multiplication circuits in [1], for asymptotic purposes
we obtain the following corollary from the above proposition.

Corollary 3.1. Two points on an Edwards curve in projective representation can be added in linear depth
with a polynomial-size quantum circuit.

Proof: This follows immediately from the multiplier architectures described in [1], which have linear depth
and involve only a polynomial number of gates. ■

4 Conclusion 

The circuits for binary elliptic curve arithmetic we have presented here are most likely not 'optimal' yet, 
but they give ample evidence that incorporating results from the classic elliptic curve literature in quantum 
circuit design is worthwhile: it is possible to bring down the number of gates and T-gates that need to be 
protected against errors and it is possible to reduce the overall circuit depth and T-depth. We hope that our 
results stimulate follow-up work on the design of efficient quantum circuits for elliptic curve arithmetic — 
including the case of fields of odd characteristic. For adequately evaluating the cryptanalytic potential of
quantum computers, this appears to be a fruitful and important research avenue. 